Add support for CUDA >= 12.9 #757
Conversation
Thanks a lot for sending this PR @Kh4L, this is very helpful.
I applied our linter and also enabled testing for CUDA 12.9 so that we can correctly check the PR.
Do I understand correctly that in 12.9 we have to use the context-based API, while at the same time the context creation helper was removed?! This sounds error-prone; is there any way we could avoid manually building and setting the context attributes?
@NicolasHug LMK if you need anything else! The assertion error doesn't seem related to my change.
Nothing else to do on your side @Kh4L, thank you. I'll merge this soon; I'll just try to extract all the …
Revert "…e. Leaving" (this reverts commit 7056fc0).
Let me try to summarize the changes and add context for other reviewers, and for future reference. From their release notes, CUDA 12.9 deprecates:

- the non-context NPP APIs (the variants without the `_Ctx` suffix)
- the `nppGetStreamContext` helper

Removing the non-context APIs makes sense to me. I don't understand the logic behind removing `nppGetStreamContext`: we now have to manually create the `NppStreamContext` ourselves.
```cpp
// NppStreamContext hStream and nStreamFlags should not be part of the cache
// because they may change across calls.
NppStreamContext nppCtx = createNppStreamContext(
    static_cast<int>(getFFMPEGCompatibleDeviceIndex(device_)));
```
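For reference, a `createNppStreamContext` helper along these lines can fill the documented `NppStreamContext` fields (see `nppdefs.h`) from the CUDA runtime. This is a sketch against the public NPP/CUDA APIs, not necessarily the PR's exact code:

```cpp
#include <cuda_runtime.h>
#include <nppdefs.h>

// Build an NppStreamContext by hand, since CUDA 12.9+ deprecates the
// nppGetStreamContext() helper. Field names follow nppdefs.h.
NppStreamContext createNppStreamContext(int deviceIndex) {
  NppStreamContext ctx{};
  ctx.nCudaDeviceId = deviceIndex;

  cudaDeviceGetAttribute(
      &ctx.nMultiProcessorCount, cudaDevAttrMultiProcessorCount, deviceIndex);
  cudaDeviceGetAttribute(
      &ctx.nMaxThreadsPerMultiProcessor,
      cudaDevAttrMaxThreadsPerMultiProcessor,
      deviceIndex);
  cudaDeviceGetAttribute(
      &ctx.nMaxThreadsPerBlock, cudaDevAttrMaxThreadsPerBlock, deviceIndex);

  // nSharedMemPerBlock is a size_t, but cudaDeviceGetAttribute writes an int.
  int sharedMemPerBlock = 0;
  cudaDeviceGetAttribute(
      &sharedMemPerBlock, cudaDevAttrMaxSharedMemoryPerBlock, deviceIndex);
  ctx.nSharedMemPerBlock = static_cast<size_t>(sharedMemPerBlock);

  cudaDeviceGetAttribute(
      &ctx.nCudaDevAttrComputeCapabilityMajor,
      cudaDevAttrComputeCapabilityMajor,
      deviceIndex);
  cudaDeviceGetAttribute(
      &ctx.nCudaDevAttrComputeCapabilityMinor,
      cudaDevAttrComputeCapabilityMinor,
      deviceIndex);

  // hStream and nStreamFlags are deliberately left for the caller to set
  // per call, since they can change between calls (see the comment above).
  return ctx;
}
```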
A note on the cache: I originally implemented the "cache" as a simple `nppCtx_` attribute on the `CudaDeviceInterface` class: 565896e (#757). But I don't think that would be correct: the `CudaDeviceInterface` instance is global, and we only have one single instance for all CUDA devices. And we can't use one single `NppStreamContext` for all CUDA devices; we need one `NppStreamContext` per device.
So we need a per-device cache for the `NppStreamContext`, similar to our existing `hw_device_ctx` cache. I'm leaving that for an immediate follow-up.
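For illustration only, a hypothetical per-device cache could look like the sketch below. The names `perDeviceNppCtx` and `getCachedNppStreamContext` are assumptions, not from the PR; it reuses the `createNppStreamContext` helper sketched above:

```cpp
#include <map>
#include <mutex>

// Hypothetical follow-up sketch: cache one NppStreamContext per device
// index, since the device interface instance is shared across all devices.
std::map<int, NppStreamContext> perDeviceNppCtx;
std::mutex nppCtxMutex;

NppStreamContext getCachedNppStreamContext(int deviceIndex) {
  std::lock_guard<std::mutex> lock(nppCtxMutex);
  auto it = perDeviceNppCtx.find(deviceIndex);
  if (it == perDeviceNppCtx.end()) {
    it = perDeviceNppCtx
             .emplace(deviceIndex, createNppStreamContext(deviceIndex))
             .first;
  }
  // Return a copy so the caller can set hStream/nStreamFlags per call
  // without mutating the cached entry.
  return it->second;
}
```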
```
@@ -265,37 +303,37 @@ void CudaDeviceInterface::convertAVFrameToFrameOutput(
  dst = allocateEmptyHWCTensor(height, width, device_);
}

// Use the user-requested GPU for running the NPP kernel.
c10::cuda::CUDAGuard deviceGuard(device_);
```
This guard isn't needed anymore, as we now explicitly pass the device to the `NppStreamContext` creation.
```cpp
at::cuda::CUDAStream nppStreamWrapper =
    c10::cuda::getStreamFromExternal(nppGetStream(), device_.index());
nppDoneEvent.record(nppStreamWrapper);
nppDoneEvent.block(at::cuda::getCurrentCUDAStream());
```
These syncs aren't needed anymore because we now explicitly ask NPP to rely on PyTorch's current stream.
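For illustration, here is a sketch of how the context can be pointed at PyTorch's current stream on each call. The helper name is an assumption; the `at::cuda` and CUDA runtime calls are real APIs:

```cpp
#include <ATen/cuda/CUDAContext.h>
#include <cuda_runtime.h>
#include <nppdefs.h>

// Point the NPP context at PyTorch's current stream for the target device,
// so NPP work is ordered with other PyTorch CUDA work on that stream and
// no extra event-based synchronization is needed.
void setNppStreamToCurrentTorchStream(
    NppStreamContext& nppCtx,
    c10::DeviceIndex deviceIndex) {
  nppCtx.hStream = at::cuda::getCurrentCUDAStream(deviceIndex).stream();
  unsigned int flags = 0;
  cudaStreamGetFlags(nppCtx.hStream, &flags);
  nppCtx.nStreamFlags = flags;
}
```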
Recent CUDA versions don't support non-context NPP calls, so use the context-based API calls. Also, CUDA 12.9+ deprecates `nppGetStreamContext`, so we need to build the NPP context manually.
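As an illustration of the context-based API, here is a sketch of an NV12-to-RGB conversion using a `_Ctx` variant. Buffers, steps, and ROI are placeholders, not the PR's code:

```cpp
#include <nppi_color_conversion.h>

// The _Ctx-suffixed NPP variants take an explicit NppStreamContext instead
// of relying on NPP's implicit global stream.
NppStatus convertNV12ToRGB(
    const Npp8u* const src[2], // NV12: luma plane + interleaved chroma plane
    int srcStep,
    Npp8u* dst,
    int dstStep,
    NppiSize roi,
    NppStreamContext nppCtx) {
  // Old, non-context form (deprecated in recent CUDA):
  //   nppiNV12ToRGB_8u_P2C3R(src, srcStep, dst, dstStep, roi);
  return nppiNV12ToRGB_8u_P2C3R_Ctx(src, srcStep, dst, dstStep, roi, nppCtx);
}
```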